SQEL: a multilingual and multifunctional dialogue system
Authors
Abstract
Within the EC-funded project SQEL, the German EVAR spoken dialogue system has been extended with respect to multilinguality and multifunctionality. The current demonstrator can handle four different languages and domains: German, Slovak, and Czech (and their national train connections), and Slovenian (European flights). The SQEL demonstrator can also access databases on the WWW, which enables users without an internet connection to meet their information needs by just using the phone. The system starts up with a German opening phrase and the user is free to use any of the implemented languages. A multilingual word recognizer implicitly identifies the language, which is then associated with the appropriate domain and database. For the remainder of the dialogue, the corresponding monolingual recognizer is used instead. Experiments to date have shown that the multilingual and the (respective) monolingual recognizers attain comparable word accuracy rates, although the former is less efficient. The existence of language-independent task parameters, such as goal and source location, has meant that porting the system to a new language involves mainly the development of lexica and grammars (apart from the word recognizers) and not an extensive restructuring of the interpretation process within the Dialogue Manager. The latter is sufficiently flexible to switch between the different domains and languages.

1. The EVAR Dialogue System

The spoken dialogue system EVAR (Erkennen, Verstehen, Antworten, Rückfragen: Recognize, Understand, Reply, Ask back) has been connected to the German public telephone network since 1994 to answer enquiries on German InterCity train connections [3, 2]. One of the ambitions regarding EVAR has been to render it multifunctional. The application should be generalised to cover not just train connections, but also other means of transport, as well as hotel and holiday reservations. The first step in this direction has been the development of the SQEL demonstrator, which covers multiple languages and domains. In Section 2, the multilingual recognizer and the multifunctional dialogue manager are described, including preliminary results with the former. Then in Section 3, the connection of the system to the World-Wide-Web is explained.

2. Multilinguality and Multifunctionality

The multifunctionality of the EVAR system was tested in the framework of the EC-funded Copernicus project COP1634 SQEL (Spoken Queries in European Languages) [6]. The goal was partly to enhance the functionality of the system with regard to a number of domains, namely flight and train information. The main aim, however, was to achieve multilinguality for EVAR, that is, the system should be capable of operating across the German, Slovak, Slovenian, and Czech languages. The core of this research has been the development of a multilingual word recognizer (Section 2.1) and the extension of the already flexible Dialogue Manager of EVAR (Section 2.2), giving rise to the SQEL demonstrator.

2.1. Speech Recognition

One of the major tasks of a multilingual dialogue system is the recognition of the user utterances. Inside the SQEL system, this is done by a multilingual Speech Recognizer (SR). One method to perform multilingual speech recognition is to run all existing recognizers in parallel and choose the most probable word chain. To reduce the computational load, a single recognizer was built instead that contains the words from all languages in its dictionary. The basis for our multilingual SR is a series of monolingual SRs.
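As described in the abstract, the multilingual SR is only needed for the first user utterance; the language identified there selects the monolingual SR and the associated domain and database for the rest of the dialogue. The following Python fragment is a minimal sketch of that control flow under assumed placeholder interfaces; none of the names below (recognizer and domain objects, their methods and attributes) are the actual EVAR/SQEL modules.

```python
# Control-flow sketch only: first turn through the multilingual recognizer,
# then switch to the identified language's monolingual recognizer and domain.
# All interfaces are hypothetical placeholders.

def run_dialogue(audio_turns, multilingual_sr, monolingual_srs, domains):
    """audio_turns: iterable of audio segments, one per user turn.
    multilingual_sr: recognizer with the merged four-language dictionary; assumed
        to return a hypothesis carrying the implicitly identified `.language`.
    monolingual_srs / domains: dicts mapping a language to its monolingual
        recognizer and to the associated domain/database front end."""
    turns = iter(audio_turns)

    # Turn 1: multilingual recognition and implicit language identification.
    first = multilingual_sr.recognize(next(turns))
    language = first.language

    # The identified language fixes the domain (train vs. flight info) and database.
    domain = domains[language]
    domain.handle(first.words)

    # Remaining turns: only the corresponding monolingual recognizer is used.
    recognizer = monolingual_srs[language]
    for audio in turns:
        domain.handle(recognizer.recognize(audio).words)

    return language
```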
Semi-continuous HMMs are used for acoustic modelling and bigrams for linguistic modelling. The monolingual recognizers are trained in the ISADORA environment, which uses polyphones with maximum context as subword units [5]. The development of the multilingual SR involved the following steps:

1. The number of codebook density functions was increased to reflect the language-dependent codebooks. In the case of two languages, for example, with a codebook of 256 density functions for each, the multilingual recognizer would have 512 density functions.

2. Special weight coefficients were added to the HMM output density functions to reflect the increased number of available density functions. The new weight coefficients were set to zero, so that density functions belonging to other languages have no effect on the output probability of the HMM.

3. A special bigram model was constructed which consists of the monolingual bigrams and does not allow any transitions between languages, as shown in Equation 1:

   $P(\text{word}_{\text{language}_i} \mid \text{word}_{\text{language}_j}) = 0 \quad \text{for } i \neq j \qquad (1)$

4. A special silence category was established for language-specific silence models, which allows transitions to and from every language, so that the language can be switched by means of inserting pauses.

In order to reflect the quality of the acoustic models for the different languages, an additional a priori value was introduced for each language. In theory, when using the standard beam search in forward decoding, only word chains of the spoken language will remain after a few seconds. The effect is that the number of words inside the active vocabulary will be the same as when using the respective monolingual recognizers.

Experimental Results

Our approach to multilingual speech recognition has been evaluated with the four languages of the SQEL project: German, Slovenian, Slovak, and Czech. Because of the special silence category used, the recognized word chain can contain words from different languages. In order to assess the accuracy, the language of the word chain is determined on the basis of the number of words in each language, selecting the one with the most words. All words found in other languages are deleted from the recognized word chain, each one counting as a deletion error.

In the context of a dialogue system, only the first user utterance will be processed by the multilingual SR. The language identified at that point will be adopted for the whole of the remaining dialogue, which involves the use of a monolingual SR. As shown in Table 1, the monolingual SRs are still superior to the multilingual SR, because of the instances of language identification failure salient in the latter. These failures occur especially with short sentences, as the time available for a robust discrimination between languages is insufficient in these cases (Table 2). When evaluating the mono- and the multilingual SRs on utterances with more than 5 words, there are only slight differences in the corresponding word accuracy rates, but the language identification rates are significantly higher. The Real Time Factor (RTF) for the multilingual system is more than two times higher than for monolingual recognizers with the language already established. However, the multilingual system is nearly twice as fast as running 4 monolingual recognizers in parallel.
The reason for this is that, at the beginning, all possible languages are inside the beam and have to be searched; only after a few seconds does the active vocabulary narrow to the spoken language.

Recognizer        Slovenian    Slovak       Czech        German       RTF
Mono Slovenian    88% (90%)    -            -            -            1
Mono Slovak       -            88% (88%)    -            -            1
Mono Czech        -            -            84% (83%)    -            1.3
Mono German       -            -            -            90% (91%)    1.2
Multi             83% (87%)    86% (85%)    84% (83%)    84% (86%)    2.5

Table 1: Recognition rates (word accuracy) and Real Time Factor (RTF) using monolingual and multilingual speech recognizers on all sentences of the SQEL test corpus; the recognition rates for sentences longer than 5 words are shown in brackets.

Table 2 (header only): Test Set | Slovenian | Slovak | Czech | German
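The language-identification and scoring procedure used in the evaluation above can be made concrete with a short sketch. The following Python fragment assumes a simplified representation of the recognizer output as (word, language) pairs; the function name and data layout are illustrative assumptions, not the actual SQEL evaluation code.

```python
# Sketch of the scoring described under Experimental Results: the language of a
# recognized word chain is the one contributing the most words; all words from
# other languages are removed and each one counts as a deletion error.

from collections import Counter
from typing import List, Tuple

def identify_and_clean(word_chain: List[Tuple[str, str]]) -> Tuple[str, List[str], int]:
    """Return (identified language, word chain restricted to it, deletion errors)."""
    counts = Counter(lang for _, lang in word_chain)
    identified = counts.most_common(1)[0][0]            # majority-vote language
    kept = [word for word, lang in word_chain if lang == identified]
    deletions = len(word_chain) - len(kept)             # foreign words -> deletions
    return identified, kept, deletions

# Example with generic tokens: three Slovak words and one stray Czech word.
hypothesis = [("w1", "slovak"), ("w2", "slovak"), ("w3", "slovak"), ("w4", "czech")]
print(identify_and_clean(hypothesis))   # ('slovak', ['w1', 'w2', 'w3'], 1)
```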
Similar resources
A Multilingual Dialogue System for Accessing the Web
In this paper we propose the use of multilingual multichannel dialogue systems to improve the usability of web contents. In order to improve both the communication and the portability of those dialogue systems we propose the separation of the general components from the application-specific, language-specific and channel-specific aspects. This paper describes the multilingual dialogue system fo...
Towards Development of Multilingual Spoken Dialogue Systems
Developing multilingual dialogue systems brings up various challenges. Among them development of natural language understanding and generation components, with a focus on creating new language parts as rapidly as possible. Another challenge is to ensure compatibility between the different language specific components during maintenance and ongoing development of the system. We describe our expe...
The Erlangen Spoken Dialogue System EVAR: A State-of-the-Art Information Retrieval System
In this paper, we present an overview of the spoken dialogue system EVAR that was developed at the University of Erlangen. In January 1994, it became accessible over telephone line and could answer inquiries in the German language about German InterCity train connections. It has since been continuously improved and extended, including some unique features, such as the processing of out-of-vocab...
An Overview of the Slovenian Spoken Dialog System
In the paper we present the modules of the Slovenian spoken dialog system, developed within the joint project in multilingual speech recognition and understanding “Spoken Queries in European Languages”, SQEL-Copernicus-1634. The system can handle spontaneous speech and provide the user with correct information in the domain of air flight information retrieval. The major modules of the system pe...
A Multilingual Spoken Dialog System
This paper will briefly introduce MSDSKIT-1 (Multilingual Spoken Dialogue System Version 1.0 developed by Kyoto Institute of Technology) which integrates Japanese and Chinese now. It is a promotion vision of the SDSKIT-3 (Spoken Dialogue System in Japanese). This system can provide services such as sight-seeing introduction, traffic guidance, hotel reservation. A user can also plan his itinerar...